3 research outputs found
XGBOD: Improving Supervised Outlier Detection with Unsupervised Representation Learning
A new semi-supervised ensemble algorithm called XGBOD (Extreme Gradient
Boosting Outlier Detection) is proposed, described and demonstrated for the
enhanced detection of outliers from normal observations in various practical
datasets. The proposed framework combines the strengths of both supervised and
unsupervised machine learning methods by creating a hybrid approach that
exploits each of their individual performance capabilities in outlier
detection. XGBOD uses multiple unsupervised outlier mining algorithms to
extract useful representations from the underlying data that augment the
predictive capabilities of an embedded supervised classifier on an improved
feature space. The novel approach is shown to provide superior performance in
comparison to competing individual detectors, the full ensemble, and two
existing representation-learning-based algorithms across seven outlier
datasets.
Comment: Proceedings of the 2018 International Joint Conference on Neural
Networks (IJCNN
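The feature-augmentation idea in the XGBOD abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the two scoring functions (k-th nearest-neighbour distance and distance to the centroid), the function names, and the toy data are all assumptions chosen for illustration, and the supervised classifier that would be trained on the augmented matrix is left out.

```python
import math

def knn_distance_scores(X, k):
    """Unsupervised score per row: distance to its k-th nearest neighbour."""
    scores = []
    for i, x in enumerate(X):
        dists = sorted(math.dist(x, y) for j, y in enumerate(X) if j != i)
        scores.append(dists[k - 1])
    return scores

def centroid_distance_scores(X):
    """Unsupervised score per row: distance to the mean of all rows."""
    n, d = len(X), len(X[0])
    centroid = [sum(row[c] for row in X) / n for c in range(d)]
    return [math.dist(x, centroid) for x in X]

def augment_features(X, ks=(1, 2)):
    """Append one column per unsupervised score to the original features."""
    score_columns = [knn_distance_scores(X, k) for k in ks]
    score_columns.append(centroid_distance_scores(X))
    return [list(x) + [col[i] for col in score_columns]
            for i, x in enumerate(X)]

# Toy data (hypothetical): four clustered points and one distant point.
X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
X_aug = augment_features(X)
```

Each row of X_aug holds the two original features followed by three outlier scores. In a full pipeline, a labelled subset of such rows would be passed to a supervised classifier, which is the step the abstract attributes to the embedded classifier.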
LSCP: Locally Selective Combination in Parallel Outlier Ensembles
In unsupervised outlier ensembles, the absence of ground truth makes the
combination of base outlier detectors a challenging task. Specifically,
existing parallel outlier ensembles lack a reliable way of selecting competent
base detectors during model combination, which affects accuracy and stability. In
this paper, we propose a framework, called Locally Selective Combination in
Parallel Outlier Ensembles (LSCP), which addresses the issue by defining a
local region around a test instance using the consensus of its nearest
neighbors in randomly selected feature subspaces. The top-performing base
detectors in this local region are selected and combined as the model's final
output. Four variants of the LSCP framework are compared with seven widely used
parallel frameworks. Experimental results demonstrate that one of these
variants, LSCP_AOM, consistently outperforms baselines on the majority of
twenty real-world datasets.
Comment: Proceedings of the 2019 SIAM International Conference on Data Mining
(SDM
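The local-region and detector-selection steps described in the LSCP abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: the abstract does not say how detector competence is measured without labels, so the sketch assumes a pseudo ground truth (the mean score across detectors) and ranks detectors by Pearson correlation with it. It also selects a single detector rather than combining several. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

def local_region(X_train, x_test, n_neighbors=3, n_subspaces=10, seed=0):
    """Training indices that are nearest neighbours of x_test in more than
    half of the randomly drawn feature subspaces (the consensus step)."""
    rng = random.Random(seed)
    d = len(x_test)
    counts = [0] * len(X_train)
    for _ in range(n_subspaces):
        # Draw a random subspace containing at least half of the features.
        dims = rng.sample(range(d), rng.randint(max(1, d // 2), d))
        dist = [math.dist([row[c] for c in dims], [x_test[c] for c in dims])
                for row in X_train]
        nearest = sorted(range(len(X_train)), key=dist.__getitem__)
        for i in nearest[:n_neighbors]:
            counts[i] += 1
    return [i for i, c in enumerate(counts) if c > n_subspaces / 2]

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def select_detector(train_scores, region):
    """train_scores[j][i] is the score of detector j on training point i.
    Assumption: the pseudo ground truth is the mean score across detectors."""
    m = len(train_scores)
    pseudo = [sum(s[i] for s in train_scores) / m for i in region]
    competence = [pearson([s[i] for i in region], pseudo)
                  for s in train_scores]
    return max(range(m), key=competence.__getitem__)

def nn_distance(X, k):
    """Toy base detector: distance to the k-th nearest neighbour."""
    return [sorted(math.dist(x, y) for j, y in enumerate(X) if j != i)[k - 1]
            for i, x in enumerate(X)]

# Toy data (hypothetical): two well-separated clusters.
X_train = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.4, 0.4],
           [9.0, 9.5], [10.0, 10.0], [9.5, 9.0]]
x_test = [0.1, 0.1]
train_scores = [nn_distance(X_train, k) for k in (1, 2, 3)]

region = local_region(X_train, x_test)
best = select_detector(train_scores, region)
```

Here region contains only points from the cluster nearest to x_test, and best is the index of the base detector judged most competent in that region. A fuller version would score x_test with the selected detector, or combine several top-ranked detectors as the abstract describes.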